******************************
FILEDB v1.1
keep knowledge about already seen files
by Guenter Nagler
1996
(gnagler@ihm.tu-graz.ac.at)
******************************
[1] BACKGROUND
When downloading files from newsgroups or ftp sites, I often got the same
files from different locations and didn't notice it until I looked at them
(e.g. listening to a sound file, displaying a picture or starting a program).
That costs a lot of time when downloading frequently.
First I wrote a program, FINDDBL, that compared new files with those I had
already picked. This helped because "good" files are stored on different sites
and reposted often.
But I found that I still spent much time on "bad" duplicates or reposts of
files that I had already deleted.
So I wrote this program to save time by keeping knowledge about already
seen files in a small database and finding the duplicates automatically.
Sure, I know that files are proven duplicates only if you have compared them
against the original, but keeping the "bad" original files is a bad solution.
This program uses only 62 bytes to identify a file duplicate:
- 4 bytes: file size
- 4 bytes: checksum of whole file
- 42 bytes: data from the original file picked up from different file positions
- 12 bytes: filename of the original file without path (dos compatible)
If the filename is not exactly the same, the program only suggests
that the file is probably a duplicate.
Experience has shown that this assumption identifies file duplicates reliably,
and only deliberately constructed file changes can cause a wrong identification.
Normally, an edited version of a file has a new checksum or file size.
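
The 62-byte record described above can be sketched in Python as follows.
This is only an illustration, not FILEDB's real format: the checksum
algorithm (CRC-32 here), the 42 sample positions (evenly spaced here), the
byte order and the function names fingerprint and compare are all assumptions.

```python
import struct
import zlib

SAMPLE_COUNT = 42  # number of data bytes sampled from the file


def fingerprint(data: bytes, name: str) -> bytes:
    """Build a 62-byte record: size, checksum, 42 sampled bytes, 12-byte name."""
    size = len(data)
    checksum = zlib.crc32(data) & 0xFFFFFFFF  # assumption: CRC-32
    if size:
        # assumption: evenly spaced positions (they repeat for very short files)
        samples = bytes(data[(i * size) // SAMPLE_COUNT] for i in range(SAMPLE_COUNT))
    else:
        samples = bytes(SAMPLE_COUNT)  # empty file: 42 zero bytes
    # DOS 8.3 name without path, padded with zero bytes to 12 bytes
    dos_name = name.upper().encode("ascii", "replace")[:12].ljust(12, b"\0")
    return struct.pack("<II", size & 0xFFFFFFFF, checksum) + samples + dos_name


def compare(a: bytes, b: bytes) -> str:
    """Compare two records: the first 50 bytes describe content, the last 12 the name."""
    if a[:50] != b[:50]:
        return "different"
    return "duplicate" if a[50:] == b[50:] else "probable duplicate"
```

With this sketch, the same content under another filename gives
"probable duplicate", which mirrors the behavior described above for
filenames that do not match exactly.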
[2] FILES DESCRIPTION
FILEDB.EXE..........MSDOS executable
FILEDB.DOC..........this file, showing usage of FILEDB.EXE
Only FILEDB.EXE is required to run the program.
[3] COPYRIGHT
FILEDB (c) 1996 was created by Guenter Nagler.
FILEDB is free and may be used as you wish with this one exception:
You may NOT charge any fee or derive any profit for distribution
of FILEDB. Thus, you may NOT sell or bundle FILEDB with any
product in a retail environment (shareware disk distribution, CD-ROM,
etc.) without permission of the author.
You may give FILEDB to your friends, upload it to a BBS, or ftp it to
another internet site, as long as you don't charge anything for it.
[4] DISCLAIMER
FILEDB handles database files that keep knowledge about already seen
files and answers whether a file has been seen before.
Use FILEDB at your own risk. Anything you do with FILEDB is your
responsibility, and not the author's. Any damage caused to any person,
computer, software, hardware, company, or business by running FILEDB
is your responsibility, and the author will not be liable.
If you don't understand these terms, or are not sure of something, or
are afraid something bad might come of using FILEDB, don't use it!
You are hereby forewarned.
[5] INSTALLATION
Simply copy FILEDB.EXE into a directory that is in your path.
To use a knowledge database with a certain path and name, call FILEDB from
a batch file, e.g. mididup.bat:
@ECHO OFF
FILEDB -f c:\midifile.db %1 %2 %3 %4 %5 %6 %7 %8 %9
By default FILEDB uses FILEKNOW.DB in the current directory as database
(it is empty at the beginning).
[6] USAGE
usage: filedb [-a] [-c] [-del] [-f name] [-r name] [-s] [-o [#]] [-R] [-dir] filemask ...
  -a      add files
  -c      count files
  -f      use other file as database (default: fileknow.db)
  -r      remove knowledge about file
  -del    delete files that are already known
  -s      affects matching files in subdirectories
  -o [#]  optimize database at given level (reducing less used entries)
  -dir    list matching files
  -R      repair corrupt database (save before!)
The parameters in brackets [ ] are optional.
FILEDB needs one or more existing filenames as parameters (wildcards * and ?
are allowed).
FILEDB supports the following commands:
* ask the database whether the given files are known (default command)
e.g. filedb *.bmp
* adding files to the knowledge database (-a)
e.g. filedb -a *.bmp
* get the count of known files (-c, no filename needed)
e.g. filedb -c
* use a certain file at a certain location as database (-f)
e.g. filedb -f c:\pictures\pictures.db
Without -f, FILEDB uses the file .\fileknow.db as database.
Warning: The database file will be changed, so it must be writable.
Always create a new database when starting a new collection of file knowledge.
* remove the knowledge about the given files (-r)
e.g. filedb -r filename.ext
* delete already seen files (duplicates) (-del)
e.g. filedb -del *.*
Only the files that are already known (read the assumption in [1])
are deleted.
The program asks y/n if the duplicate has a different filename.
* also perform the action on matching files in subdirectories (-s)
e.g. filedb -s *.exe
This asks the database whether all *.exe files in the current directory and
its subdirectories are known.
* optimize the database (-o)
If the database becomes too large and you want to make it smaller,
you can shrink it by deleting all file information that was added
but never seen again as a duplicate.
e.g. filedb -o 1
This command does not recognize any filename parameters.
* list the content of a file knowledge database (-dir)
e.g. filedb -dir *.*
This command does not check whether the given filenames exist on disk;
it lists the stored information whenever a matching filename is found
in the database.
It is comparable to the DOS DIR command or UNIX ls.
* repair a damaged database (-R)
If a harddisk failure damaged part of the database, this command
tries to collect the remaining valid file information and generates
a repaired database. Hopefully you will never need this command.
e.g. filedb -R
You can get the program version with the hidden option -verbose.
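
To make the commands above easier to follow, here is a small in-memory model
in Python. It is a sketch under assumptions, not FILEDB's real database
format: the class KnowledgeDB, its method names, the opaque content key, the
per-entry hit counter and the meaning of the optimize level are all
hypothetical.

```python
from fnmatch import fnmatchcase


class KnowledgeDB:
    """Hypothetical in-memory model of a file knowledge database."""

    def __init__(self):
        # key: content fingerprint (opaque bytes); value: [filename, hit count]
        self.entries = {}

    def add(self, key: bytes, name: str) -> bool:
        """-a: remember a file; returns False if it was already known."""
        if key in self.entries:
            return False
        self.entries[key] = [name.upper(), 0]
        return True

    def ask(self, key: bytes, name: str) -> str:
        """Default command: report whether a file is known."""
        entry = self.entries.get(key)
        if entry is None:
            return "unknown"
        entry[1] += 1  # assumption: every hit counts as "seen as duplicate"
        return "known" if entry[0] == name.upper() else "probably known"

    def remove(self, key: bytes) -> bool:
        """-r: forget a file; returns False if it was not known."""
        return self.entries.pop(key, None) is not None

    def count(self) -> int:
        """-c: number of known files."""
        return len(self.entries)

    def optimize(self, level: int = 1) -> int:
        """-o: drop entries with fewer than `level` hits (assumed meaning of #)."""
        before = len(self.entries)
        self.entries = {k: v for k, v in self.entries.items() if v[1] >= level}
        return before - len(self.entries)

    def listing(self, mask: str) -> list:
        """-dir: stored names matching a wildcard mask; no disk access."""
        mask = mask.upper()
        return sorted(n for n, _ in self.entries.values() if fnmatchcase(n, mask))
```

In this model optimize(1) removes exactly those entries that were added but
never asked for again, which is how the text above describes "filedb -o 1".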
[7] EXAMPLE
command> uex *.uh
uex reads all files matching *.uh and extracts the binary data from the
encoded articles.
command> cd result
The directory result contains all successfully extracted files.
command> play file.mid
Look at (or listen to) the files in this directory and
copy the good files into another directory:
command> copy good.mid g:\archive\midi
Then run FILEDB on the file in result:
command> FILEDB good.mid
FILEDBed in GOOD1.UH
FILEDBed in GOOD2.UH
FILEDBed in GOOD3.UH
good.mid deleted
GOOD*.UH remain but contain only the mail header and description.
Create a batch file to simplify use of the MIDI database:
File mididb.bat:
@echo off
filedb -f c:\midi.db %1 %2 %3 %4 %5 %6 %7 %8 %9
command> mididb *.mid
Create a batch file to simplify deleting duplicate MIDI files in the current
directory:
File delmidi.bat:
@echo off
filedb -f c:\midi.db -del *.mid
command> delmidi
All files not seen so far remain in the directory.
The remaining files are not registered as seen in the midi.db database.
To register them as seen you must also specify option -a (add).
FILEDB works with any file type (it is not limited to MIDI or other sound
formats).
[8] SUGGESTIONS / COMMENTS / BUG REPORTS
WWW: http://hgiicm.tu-graz.ac.at/Cpub
EMAIL: gnagler@ihm.tu-graz.ac.at
[9] CHANGES
v1.0 to v1.1:
* added a "delete all" choice to the delete command: when the program asks
whether a file should be deleted, pressing key "a" deletes all other known
files automatically without asking the user again.
* fixed a bug when using options -del and -a at the same time
(the program first deleted the file if seen and then tried to add the
deleted file)
New behavior:
-del      deletes file if already seen (asks user if the filename is new)
-a        adds file if not seen yet (otherwise reports the already seen info)
-del -a   deletes file if already seen, otherwise adds file without deleting it
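
The new behavior can be written down as a small decision function. This
Python sketch only restates the rules listed above; the function name plan,
its parameters and the returned strings are hypothetical and are not taken
from FILEDB itself.

```python
def plan(known: bool, same_name: bool, delete: bool, add: bool) -> str:
    """Return the action for one file, following the v1.1 rules listed above.

    known      the file content is already in the database
    same_name  the stored filename equals the current filename
    delete     option -del was given
    add        option -a was given
    """
    if known and delete:
        # -del: delete seen files, but confirm first if the filename is new
        return "delete" if same_name else "ask user, then delete"
    if known:
        # with or without -a: only report the already seen info
        return "report already seen"
    if add:
        # -a (also together with -del): add the unseen file, never delete it
        return "add"
    if delete:
        # -del alone: unseen files stay on disk and are not registered
        return "keep"
    return "report not seen"
```

For example, plan(False, False, True, True) gives "add", so a file that is
new to the database is added and left on disk when -del and -a are combined.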